The stock market is often considered a primary indicator of a country’s economic strength and development. Stock prices are volatile and are affected by factors such as inflation and economic growth. A fluctuating market affects investor confidence, so there is a need to predict where the stock market is heading. A stock market index measures a stock market or a subset of it, and is computed from the prices of selected stocks. Indexes provide a big-picture view of whether stock prices are generally moving up, down, or sideways from moment to moment, and by how much. They also help investors compare current price levels with past prices to calculate market performance.
The objective of this project is to study stock market indexes with quantitative analysis and to predict their movements in order to make more informed and accurate investment decisions. Three stock market indexes (the Dow Jones Industrial Average, the S&P 500, and the Nasdaq Composite) are surveyed over the past ten years. In the United States, these are the three indexes most broadly followed by both the media and investors. The media reports on the direction of these three indexes throughout the day, with key news items serving as contributors and detractors.
We begin by studying and comparing the three indexes with Exploratory Data Analysis. We conduct some baseline analysis, including introducing what each index represents, analyzing historical data, and comparing trading volumes. We then dive deeper to study risk by calculating returns, comparing volatilities, and finding the correlation between the indexes. After that we build two kinds of time series models, an ARIMA model and a volatility model, to predict the indexes. ARIMA is used to forecast closing prices. For each index, several ARIMA models with different parameters are compared on methodology, efficiency, and prediction results, and the results are presented graphically. Different types of volatility models, such as GARCH and stochastic volatility, are built to analyze volatility. Along the way, we draw some new insights from these indexes.
#load packages
library(ggplot2)
library(xts)
library(dygraphs)
library(quantmod)
library(dplyr)
# library(SparkR)  # not used below; it masks several dplyr and base functions, so it is left commented out
library(scales)
library(zoo)
library(gridExtra)
library(grid)
library(forecast)
library(PerformanceAnalytics)
library(corrplot)
library(GGally)
DJI <- read.csv("DJI.csv")
NASDAQ <- read.csv("NASDAQ.csv")
SP500 <- read.csv("SP500.csv")
head(DJI)
head(NASDAQ)
head(SP500)
summary(DJI)
##          Date           Open            High            Low
##  2009-01-02:   1   Min.   : 6547   Min.   : 6710   Min.   : 6470
##  2009-01-05:   1   1st Qu.:12267   1st Qu.:12318   1st Qu.:12199
##  2009-01-06:   1   Median :16458   Median :16529   Median :16377
##  2009-01-07:   1   Mean   :16804   Mean   :16888   Mean   :16716
##  2009-01-08:   1   3rd Qu.:20780   3rd Qu.:20844   3rd Qu.:20708
##  2009-01-09:   1   Max.   :28675   Max.   :28702   Max.   :28609
##  (Other)   :2761
##      Close         Adj.Close         Volume
##  Min.   : 6547   Min.   : 6547   Min.   :8.410e+06
##  1st Qu.:12272   1st Qu.:12272   1st Qu.:1.061e+08
##  Median :16459   Median :16459   Median :1.641e+08
##  Mean   :16809   Mean   :16809   Mean   :1.986e+08
##  3rd Qu.:20790   3rd Qu.:20790   3rd Qu.:2.695e+08
##  Max.   :28645   Max.   :28645   Max.   :2.191e+09
##
summary(NASDAQ)
##          Date           Open           High           Low            Close
##  2009-01-02:   1   Min.   :1285   Min.   :1316   Min.   :1266   Min.   :1269
##  2009-01-05:   1   1st Qu.:2756   1st Qu.:2774   1st Qu.:2744   1st Qu.:2761
##  2009-01-06:   1   Median :4342   Median :4370   Median :4320   Median :4350
##  2009-01-07:   1   Mean   :4478   Mean   :4502   Mean   :4451   Mean   :4479
##  2009-01-08:   1   3rd Qu.:5875   3rd Qu.:5899   3rd Qu.:5856   3rd Qu.:5877
##  2009-01-09:   1   Max.   :9049   Max.   :9052   Max.   :8987   Max.   :9022
##  (Other)   :2761
##    Adj.Close         Volume
##  Min.   :1269   Min.   :1.494e+08
##  1st Qu.:2761   1st Qu.:1.748e+09
##  Median :4350   Median :1.936e+09
##  Mean   :4479   Mean   :1.984e+09
##  3rd Qu.:5877   3rd Qu.:2.166e+09
##  Max.   :9022   Max.   :4.554e+09
##
summary(SP500)
##          Date            Open             High             Low
##  2009-01-02:   1   Min.   : 679.3   Min.   : 695.3   Min.   : 666.8
##  2009-01-05:   1   1st Qu.:1310.5   1st Qu.:1316.4   1st Qu.:1302.6
##  2009-01-06:   1   Median :1905.7   Median :1918.8   Median :1885.8
##  2009-01-07:   1   Mean   :1869.2   Mean   :1878.5   Mean   :1859.4
##  2009-01-08:   1   3rd Qu.:2365.7   3rd Qu.:2371.0   3rd Qu.:2355.7
##  2009-01-09:   1   Max.   :3247.2   Max.   :3247.9   Max.   :3234.4
##  (Other)   :2761
##      Close          Adj.Close          Volume
##  Min.   : 676.5   Min.   : 676.5   Min.   :1.025e+09
##  1st Qu.:1310.2   1st Qu.:1310.2   1st Qu.:3.271e+09
##  Median :1904.0   Median :1904.0   Median :3.664e+09
##  Mean   :1869.8   Mean   :1869.8   Mean   :3.885e+09
##  3rd Qu.:2364.3   3rd Qu.:2364.3   3rd Qu.:4.249e+09
##  Max.   :3240.0   Max.   :3240.0   Max.   :1.062e+10
##
#check NA
sum(is.na(DJI))
## [1] 0
sum(is.na(NASDAQ))
## [1] 0
sum(is.na(SP500))
## [1] 0
DJI$Date <- as.Date(as.character(DJI$Date))
NASDAQ$Date <- as.Date(as.character(NASDAQ$Date))
SP500$Date <- as.Date(as.character(SP500$Date))
DJI_xts <- xts(DJI$Close,order.by=DJI$Date,frequency=252)
NASDAQ_xts <- xts(NASDAQ$Close,order.by=NASDAQ$Date,frequency=252)
SP500_xts <- xts(SP500$Close,order.by=SP500$Date,frequency=252)
stocks <- cbind(DJI_xts,NASDAQ_xts, SP500_xts)
dygraph(stocks,ylab="Close",
main="DJI, NASDAQ, and SP500 Index Prices 2009-2019") %>%
dySeries("DJI_xts",label="DJI") %>%
dySeries("NASDAQ_xts",label="NASDAQ") %>%
dySeries("SP500_xts",label="SP500") %>%
dyOptions(colors = c("blue","brown", "darkgreen")) %>%
dyRangeSelector()
In the interactive plot above, we can zoom in to see details of the index level over a shorter period, and zoom out to see the whole trend of its history. Hovering the mouse over the plot shows the specific closing level of each index on each day. In the broader picture, each index shows an upward trend over the past 10 years. DJI also has a significantly higher index level than NASDAQ and SP500. This reflects how the index is built: the “Dow” includes only 30 stocks, all of which are among the largest, richest, and most heavily traded companies in the United States, and it is price-weighted. The DJI is therefore affected only by changes in stock prices, so companies with a higher share price or a more extreme price movement have a greater effect on the Dow.
The S&P 500 is more encompassing, as it is based on a larger sample of U.S. stocks. Stocks in the S&P 500 are weighted by their market value rather than their share price. In this way, for two companies of the same market value, a 10% change in a $20 stock affects the index in the same way as a 10% change in a $50 stock.
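To make the weighting difference concrete, here is a minimal sketch with two hypothetical companies of equal market value. The prices, share counts, and divisors are made up for illustration and are not taken from either index. It compares how a 10% move in each stock shifts a price-weighted level and a market-value-weighted level.

```r
# Two hypothetical stocks with equal market value but different share prices.
price  <- c(A = 20, B = 50)
shares <- c(A = 500, B = 200)   # market value: 20 * 500 = 50 * 200 = 10000

# Price-weighted level: sum of prices over a divisor (assumed to be 2 here).
price_index <- function(p) sum(p) / 2
# Value-weighted level: total market value over a divisor (assumed to be 100).
value_index <- function(p) sum(p * shares) / 100

# Percentage change in an index when one stock rises by 10%.
pct_change <- function(index_fun, stock) {
  bumped <- price
  bumped[stock] <- bumped[stock] * 1.10
  100 * (index_fun(bumped) / index_fun(price) - 1)
}

effects <- rbind(
  price_weighted = c(A = pct_change(price_index, "A"),
                     B = pct_change(price_index, "B")),
  value_weighted = c(A = pct_change(value_index, "A"),
                     B = pct_change(value_index, "B"))
)
print(round(effects, 2))
```

In this toy setup the price-weighted level moves about 2.86% when stock A rises and about 7.14% when stock B rises, while the value-weighted level moves 5% in both cases because the two companies have the same market value.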
The Nasdaq Composite covers nearly all stocks listed on the Nasdaq exchange and is generally regarded as a technology index because of its heavy weighting toward tech-based companies.
# DJI$Date was already converted to Date class above, so reuse it directly
date <- DJI$Date
DJI.vol <- DJI$Volume/1000000
NASDAQ.vol <- NASDAQ$Volume/1000000
SP500.vol <- SP500$Volume/1000000
Volumn <- data.frame(date, DJI.vol, NASDAQ.vol, SP500.vol)
Volumn <- Volumn[lubridate::year(Volumn$date) %in% c(2009:2019), ]
head(Volumn)
# labels and breaks for X axis text
brks <- Volumn$date[seq(1, length(Volumn$date), 252)]
lbls <- lubridate::year(brks)
#plot
ggplot(Volumn, aes(x=date)) + ylim(0, 12000) +
geom_area(aes(y=DJI.vol+NASDAQ.vol+SP500.vol, fill="SP500")) +
geom_area(aes(y=DJI.vol+NASDAQ.vol, fill="NASDAQ")) +
geom_area(aes(y=DJI.vol, fill="DJI")) +
labs(title = "Trading Volume of DJI, NASDAQ, and SP500", y ="Volume (in millions)") +
scale_x_date(labels = lbls, breaks = brks)
SP500 represents the broadest measure of the U.S. economy among the three major indices. The index includes 500 companies from all sectors of the economy with stocks listed on either the New York Stock Exchange or the Nasdaq. Combined, the companies in the S&P 500 account for about 75 percent of the U.S. stock market. Since the Dow represents only 30 companies, its total trading volume obviously makes up a much smaller portion of the market.
Now that we’ve done some baseline analysis, let’s dive a little deeper and analyze the risk of each index. To do so, we need to look at the daily changes of each index, not just its absolute level.
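The daily change used for this kind of analysis is the simple (arithmetic) return, (P_t - P_{t-1}) / P_{t-1}. As a small sketch on made-up prices (not taken from the index data), base R reproduces it directly. To my understanding this matches the default behavior of quantmod's `Delt`, which also leaves the first value as NA.

```r
# Hypothetical closing prices for four consecutive trading days.
prices <- c(100, 102, 99.96, 104.958)

# Simple return: (P_t - P_{t-1}) / P_{t-1}.
# The first day has no previous price, so its return is NA.
simple_return <- c(NA, diff(prices) / head(prices, -1))

# Show the returns in percent.
print(round(simple_return * 100, 2))
```

The three defined returns here are +2%, -2%, and +5%. Multiplying by 100, as the analysis does, only changes the unit to percent.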
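# Create a column of daily simple returns; as.numeric() drops the one-column
# matrix that Delt() returns, so the column is a plain numeric vector
DJI$return <- as.numeric(Delt(DJI$Close))
NASDAQ$return <- as.numeric(Delt(NASDAQ$Close))
SP500$return <- as.numeric(Delt(SP500$Close))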
date <- DJI$Date
DJI.return <- DJI$return*100
NASDAQ.return <- NASDAQ$return*100
SP500.return <- SP500$return*100
Return <- data.frame(date, DJI.return, NASDAQ.return, SP500.return)
Return <- Return[lubridate::year(Return$date) %in% c(2009:2019), ]
colnames(Return) <- c("date", "DJI.return", "NASDAQ.return", "SP500.return")
head(Return)
# labels and breaks for X axis text
brks <- Return$date[seq(1, length(Return$date), 252)]
lbls <- lubridate::year(brks)
#plot DJI return
DJI.retn <- ggplot(Return, aes(x=date)) + ylim(-7, 7) +
geom_line(aes(y=as.numeric(DJI.return), col="DJI.return")) +
labs(title="Time Series of DJI Returns Percentage",
y="Returns %") + # title
scale_color_manual(values = "blue") +
scale_x_date(labels = lbls, breaks = brks) # one tick and label every 252 trading days (about yearly)
#plot NASDAQ return
NASDAQ.retn <- ggplot(Return, aes(x=date)) + ylim(-7, 7) +
geom_line(aes(y=as.numeric(NASDAQ.return), col="NASDAQ.return")) +
labs(title="Time Series of NASDAQ Returns Percentage",
y="Returns %") + # title
scale_color_manual(values = "brown") +
scale_x_date(labels = lbls, breaks = brks) # one tick and label every 252 trading days (about yearly)
#plot SP500 return
SP500.retn <- ggplot(Return, aes(x=date)) + ylim(-7, 7) +
geom_line(aes(y=as.numeric(SP500.return), col="SP500.return")) +
labs(title="Time Series of SP500 Returns Percentage",
y="Returns %") + # title
scale_color_manual(values = "darkgreen") +
scale_x_date(labels = lbls, breaks = brks) # one tick and label every 252 trading days (about yearly)
grid.arrange(DJI.retn, NASDAQ.retn, SP500.retn, nrow=3, ncol=1)
# Calculate volatility (standard deviation of daily returns, in percent)
StdDev(as.double(DJI.return), na.rm = FALSE)
## [,1]
## StdDev 0.9603806
StdDev(as.double(NASDAQ.return), na.rm = FALSE)
## [,1]
## StdDev 1.155326
StdDev(as.double(SP500.return), na.rm = FALSE)
## [,1]
## StdDev 1.025853
Regarding volatility, the Dow Jones is the least volatile of the three major indices (a daily standard deviation of about 0.96%), as many of its components are slower-moving, blue-chip companies such as Boeing Company, United Healthcare, and 3M Company. The Nasdaq Composite is the most volatile of the three (about 1.16%), largely because of its high concentration in riskier, high-growth companies such as Facebook, Amazon, and Alphabet (Google). Volatility in the S&P 500 (about 1.03%) falls between the two.
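Volatility here is the sample standard deviation of daily percentage returns. The sketch below uses made-up returns rather than the index data and computes it with base R's `sd`. As an optional extra step that the analysis above does not perform, it also scales the daily figure to an annual one with the common square-root-of-time rule, assuming 252 trading days and uncorrelated daily returns.

```r
# Hypothetical daily returns in percent; the leading NA mimics the first day,
# which has no previous close.
daily_return <- c(NA, 0.5, -1.2, 0.8, 0.3, -0.4, 1.1, -0.9)

# Daily volatility: sample standard deviation, skipping the missing first value.
daily_vol <- sd(daily_return, na.rm = TRUE)

# Annualized volatility under the stated assumptions (252 trading days,
# uncorrelated daily returns).
annual_vol <- daily_vol * sqrt(252)

print(c(daily = daily_vol, annual = annual_vol))
```

Because the annual figure is just the daily one times a constant, the ranking of the three indexes by volatility would be the same on either scale.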
#plot DJI return distribution
DJI.dist <- ggplot(data.frame(DJI), aes(x=return)) +
labs(y= "Density", x = "Daily Return") + xlim(-0.05, 0.05) +
geom_histogram(aes(y=..density..), colour="black", fill="white", bins = 100)+
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(aes(xintercept=0), color="black", linetype="dashed", size=1) +
ggtitle("DJI Daily Return Distribution")
#plot NASDAQ return distribution
NASDAQ.dist <- ggplot(data.frame(NASDAQ), aes(x=return)) +
labs(y= "Density", x = "Daily Return") + xlim(-0.05, 0.05) +
geom_histogram(aes(y=..density..), colour="black", fill="white", bins = 100)+
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(aes(xintercept=0), color="black", linetype="dashed", size=1) +
ggtitle("NASDAQ Daily Return Distribution")
#plot SP500 return distribution
SP500.dist <- ggplot(data.frame(SP500), aes(x=return)) +
labs(y= "Density", x = "Daily Return") + xlim(-0.05, 0.05) +
geom_histogram(aes(y=..density..), colour="black", fill="white", bins = 100)+
geom_density(alpha=.2, fill="#FF6666") +
geom_vline(aes(xintercept=0), color="black", linetype="dashed", size=1)+
ggtitle("SP500 Daily Return Distribution")
grid.arrange(DJI.dist, NASDAQ.dist, SP500.dist, nrow=3, ncol=1)
Correlation can be used to gain perspective on the overall nature of the larger market. We apply the Pearson correlation to the daily returns of the three stock indexes. Correlation does not predict returns by itself, but it indicates how two indexes might perform relative to each other in the future if historical relationships persist. Applied to historical data, it helps determine whether the indexes tend to move with or against each other. Using correlation, investors may even be able to select assets that complement each other in terms of price movement, which can help reduce the overall risk and increase the potential return of a portfolio.
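For reference, the Pearson coefficient between two return series is their covariance divided by the product of their standard deviations. The following sketch uses simulated series (hypothetical data, not the index returns) to check a hand-written version against R's built-in `cor`. The helper name `pearson_manual` is made up for this example.

```r
set.seed(1)
# Two simulated return series that share a common component.
common <- rnorm(250)
x <- common + rnorm(250, sd = 0.3)
y <- common + rnorm(250, sd = 0.3)

# Pearson correlation written out: cov(x, y) / (sd(x) * sd(y)).
# The (n - 1) factors cancel, so plain sums are enough.
pearson_manual <- function(a, b) {
  sum((a - mean(a)) * (b - mean(b))) /
    sqrt(sum((a - mean(a))^2) * sum((b - mean(b))^2))
}

r_manual  <- pearson_manual(x, y)
r_builtin <- cor(x, y, method = "pearson")
print(c(manual = r_manual, builtin = r_builtin))
```

The two values agree up to floating-point error. A value close to 1 means the two series tend to rise and fall together, which is the pattern to look for in the correlation matrix of the index returns.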
head(Return)
cor(Return[, -1], method="pearson", use = "complete.obs")
## DJI.return NASDAQ.return SP500.return
## DJI.return 1.0000000 0.9055958 0.9743230
## NASDAQ.return 0.9055958 1.0000000 0.9544069
## SP500.return 0.9743230 0.9544069 1.0000000
chart.Correlation(Return[, -1], method="pearson",hist.col = "#00AFBB")
The plots above show strong correlation among the three stock indexes. The pairwise correlations are about 0.91 (DJI and NASDAQ), 0.95 (NASDAQ and SP500), and 0.97 (DJI and SP500), which means that the indexes moved almost in lockstep. This indicates that the market in general moves in the same direction, because the indexes track companies affected by the same business cycle and other important macroeconomic factors.
DJI.ts <- ts(DJI$Close, start=2009, frequency=252)
DJI_decomp <- decompose(DJI.ts)
plot(DJI_decomp, col = "blue")
NASDAQ.ts <- ts(NASDAQ$Close, start=2009, frequency=252)
NASDAQ_decomp <- decompose(NASDAQ.ts)
plot(NASDAQ_decomp, col = "brown")
SP500.ts <- ts(SP500$Close, start=2009, frequency=252)
SP500_decomp <- decompose(SP500.ts)
plot(SP500_decomp, col = "darkgreen")
ggseasonplot(DJI.ts) + labs(title="Seasonal plot: DJI")
ggseasonplot(NASDAQ.ts) + labs(title="Seasonal plot: NASDAQ")
ggseasonplot(SP500.ts) + labs(title="Seasonal plot: SP500")